02-06/12/2019

Day 1

Plan for the day

Morning

  • Arsenio: Introduction and week plan
  • Ben: What is R? Why use R?
  • Joe: The learning curve and motivation
  • Arsenio: Preparing the environment

Afternoon

  • Arsenio: Preparing the data (basic only, save databases, etc. for day 5)
  • Ben: Introduction to ggplot2 (context, rationale, structure)
  • Joe: First chart

Introduction and week plan (Arsenio)

  • Main goals for the course
  • Main goals for each day

What is R? (Ben)

  • Object-oriented programming language designed to make data handling and statistics intuitive.
  • Why use it?
    • Intuitive, powerful, and popular.
    • Open source: it’s free, and it is maintained and updated by programmers all over the world.

There is a package for that!

R is fast

R is beautiful

The learning curve (Joe)

  • Motivation
  • Expectations
  • Active, engaged learning

Preparing the environment (Arsenio)

  • Installing R
  • Installing RStudio
  • Understanding the interface / the 4 panels
    • The scripts.
    • The console.
    • The global environment.
    • The plots.

Installing R and RStudio

For this training, we will use:

  1. R – a free software environment for statistical computing and graphics (download and install from https://cloud.r-project.org/)
  2. RStudio – an integrated development environment for R (download and install from https://rstudio.com/products/rstudio/download/)
  3. Packages – extend the capabilities of R. To install a package, open RStudio, then either
    • go to the Packages tab, click Install, type the package name in the Packages field, and click Install, or
    • type install.packages("package_name") in the console and press Enter
      • e.g.: install.packages("plotly")

Packages (Arsenio)

Installing the necessary packages

Run the following code to install all the necessary packages
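
The installation code is not reproduced in these notes. A minimal sketch, assuming the course relies on packages named elsewhere in this material; the `course_packages` vector and the helper `install_if_missing()` are illustrative names, not part of the original slides:

```r
# Hypothetical package list, assembled from packages named in these notes.
course_packages <- c("ggplot2", "dplyr", "tidyr", "readr", "rio",
                     "gapminder", "plotly", "leaflet")

# Install only the packages that are not yet in the library.
# `installer` defaults to install.packages(); it is a parameter only so the
# helper can be tried out without changing the library.
install_if_missing <- function(pkgs, installer = install.packages) {
  missing <- setdiff(pkgs, rownames(installed.packages()))
  if (length(missing) > 0) {
    installer(missing)
  }
  invisible(missing)
}

# Run the installation only in an interactive session (e.g. the RStudio console).
if (interactive()) {
  install_if_missing(course_packages)
}
```

Checking what is already installed first avoids re-downloading packages every time the script is run.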

How to use/load a package (Arsenio)
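
The slide content for this step is not reproduced here. As a minimal sketch, these are the two usual ways to use an installed package; the choice of ggplot2 and dplyr as examples is an assumption:

```r
# Attach a package so that its functions can be called by name.
library(ggplot2)

# Alternatively, call a single function without attaching the package,
# using the package::function form.
n_unique <- dplyr::n_distinct(c("a", "b", "a"))

# Check whether a package is attached to the current session.
ggplot2_attached <- "package:ggplot2" %in% search()
```

A package has to be installed once, but it has to be loaded with library() in every new R session.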

Afternoon

Preparing the data (Arsenio)

  • Exploring packages and functions to read in data.
  • Cleaning column names, removing missing values, and formatting variables.
  • Preparing the data for analysis and visualization.

Introduction to ggplot2 (Ben)

  • The advantages of ggplot.
  • Quick, simple, and beautiful.
  • Exploring the basics as well as showing the potential.

The advantages

  • It's part of a pipeline of well-maintained packages.
  • Tidyverse = readr -> dplyr -> ggplot2
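
The pipeline above can be sketched end to end on a tiny made-up file. The file contents and the object names (`csv_path`, `raw`, `means`, `p`) are illustrative assumptions:

```r
library(readr)
library(dplyr)
library(ggplot2)

# Write a tiny illustrative CSV file (made-up values) to a temporary location.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("group,value", "a,1", "a,3", "b,10", "b,20"), csv_path)

# readr: read the file into a tibble.
raw <- read_csv(csv_path,
                col_types = cols(group = col_character(), value = col_double()))

# dplyr: summarise the mean value per group.
means <- raw %>%
  group_by(group) %>%
  summarise(mean_value = mean(value))

# ggplot2: draw the summary as a bar chart.
p <- ggplot(means, aes(x = group, y = mean_value)) +
  geom_col()
p
```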

Quick and simple

The potential

First chart (Joe)

## # A tibble: 6 x 6
##   country     continent  year lifeExp      pop gdpPercap
##   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
## 1 Afghanistan Asia       1952    28.8  8425333      779.
## 2 Afghanistan Asia       1957    30.3  9240934      821.
## 3 Afghanistan Asia       1962    32.0 10267083      853.
## 4 Afghanistan Asia       1967    34.0 11537966      836.
## 5 Afghanistan Asia       1972    36.1 13079460      740.
## 6 Afghanistan Asia       1977    38.4 14880372      786.
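
The output above matches the first six rows of the gapminder dataset. A minimal sketch of how it and a first chart could be produced, assuming `gm` (the name used later in these notes) is a copy of the data frame from the gapminder package; the specific chart is an illustrative choice, not necessarily the one shown in class:

```r
library(gapminder)
library(ggplot2)

# Assumption: `gm` is simply the gapminder data frame from the gapminder package.
gm <- gapminder

# Show the first six rows.
head(gm)

# An illustrative first chart: life expectancy against GDP per capita.
first_chart <- ggplot(gm, aes(x = gdpPercap, y = lifeExp)) +
  geom_point()
first_chart
```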

Preparing the data

Importing data from files (Arsenio)

Different packages/functions to import data from different file formats
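
The slide's own table of readers is not reproduced here. A minimal sketch with a temporary CSV file (made-up values); the readxl and haven functions in the comments are common choices for those formats, listed as examples rather than as the course's selection:

```r
library(readr)

# Create a small illustrative CSV file so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, score = c(2.5, 3.0, 4.5)), csv_path,
          row.names = FALSE)

# Base R reader: returns a data.frame.
from_base <- read.csv(csv_path)

# readr reader: returns a tibble.
from_readr <- read_csv(csv_path, col_types = cols())

# Other formats follow the same pattern with a format-specific function, e.g.
# readxl::read_excel("file.xlsx")   # Excel workbooks
# haven::read_dta("file.dta")       # Stata files
# haven::read_sav("file.sav")       # SPSS files
```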

Swiss-army knife for data import/export: rio
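
A minimal sketch of the rio workflow, using made-up data and temporary files (the object names are illustrative):

```r
library(rio)

example_data <- data.frame(id = 1:3, score = c(2.5, 3.0, 4.5))

# export() picks the writer from the file extension ...
csv_path <- tempfile(fileext = ".csv")
export(example_data, csv_path)

# ... and import() picks the reader the same way.
round_trip <- import(csv_path)

# convert() reads one format and writes another in a single step.
tsv_path <- tempfile(fileext = ".tsv")
convert(csv_path, tsv_path)
```

Because the format is inferred from the extension, the same two functions cover most of the file types handled by the format-specific packages.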

Importing data from database management systems (Arsenio)

Different packages for different DBMS, e.g.:

  • RODBC – implements ODBC, an API for accessing DBMS
  • RMySQL/RMariaDB – for accessing MySQL/MariaDB databases
  • ROracle – for accessing Oracle databases
  • RPostgreSQL – for accessing PostgreSQL databases
  • RSQLite – for accessing SQLite databases
  • RMongo – for accessing MongoDB databases

E.g.: Importing data from MariaDB
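
The original connection code is not reproduced in these notes. A hedged sketch using the DBI interface: the MariaDB call is shown only as a comment with placeholder credentials, and an in-memory SQLite database stands in for the server so the example can run anywhere. The table name `patients` and its values are made up.

```r
library(DBI)

# For a real MariaDB server the connection would look roughly like this
# (host, user, password and dbname are placeholders, not real values):
# con <- dbConnect(RMariaDB::MariaDB(), host = "localhost", user = "user",
#                  password = "password", dbname = "my_database")

# Stand-in so the sketch runs without a server: an in-memory SQLite database.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Create a small illustrative table (made-up values).
dbWriteTable(con, "patients", data.frame(id = 1:3, age = c(34, 51, 27)))

# Import data with an SQL query; the result is a regular data frame.
over_30 <- dbGetQuery(con, "SELECT id, age FROM patients WHERE age > 30")

# Or import a whole table.
all_patients <- dbReadTable(con, "patients")

# Always close the connection when done.
dbDisconnect(con)
```

Once connected, the query and read functions are the same for every DBI backend; only the dbConnect() call changes.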

Cleaning data (Arsenio)

  • Often the most time-consuming part of any data analysis
  • Some useful functions:
    Package  Function   Use
    dplyr    select     select variables/columns
    dplyr    filter     select observations/rows
    dplyr    mutate     transform or recode variables
    dplyr    summarize  summarize data
    dplyr    group_by   identify subgroups for further processing
    tidyr    gather     convert wide format dataset to long format
    tidyr    spread     convert long format dataset to wide format
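
The functions in the table can be tried out on a small made-up dataset. This is an illustrative sketch; the data and the object names are assumptions:

```r
library(dplyr)
library(tidyr)

# Small made-up dataset in wide format: one column per year.
wide <- data.frame(
  country = c("A", "B", "C"),
  region  = c("north", "north", "south"),
  y2006   = c(10, 20, 30),
  y2007   = c(12, 18, 33)
)

# tidyr::gather(): wide -> long (one row per country and year).
long <- gather(wide, key = "year", value = "cases", y2006, y2007)

# dplyr verbs chained with the pipe.
summary_2007 <- long %>%
  filter(year == "y2007") %>%              # keep rows
  select(region, cases) %>%                # keep columns
  mutate(cases_per_10 = cases / 10) %>%    # add a transformed variable
  group_by(region) %>%                     # define subgroups
  summarize(total_cases = sum(cases))      # one row per subgroup

# tidyr::spread(): long -> wide again.
wide_again <- spread(long, key = year, value = cases)
```

In tidyr 1.0.0 and later, pivot_longer() and pivot_wider() are the recommended replacements for gather() and spread(), which remain available.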

Transforming data (Joe)

(Interactive activity)

  • Create an object called gmx. This will be gm, but we will filter to include only the most recent year (2007).
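
One possible solution, assuming `gm` is a copy of the gapminder data frame:

```r
library(gapminder)
library(dplyr)

# Assumption: `gm` is the gapminder data frame.
gm <- gapminder

# Keep only the rows for the most recent year.
gmx <- gm %>%
  filter(year == 2007)
```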

  • Make an object called ncountries. To do this, group the data by continent and tally the number of countries
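
One possible solution, assuming "the data" means the 2007 subset `gmx`, which has one row per country:

```r
library(gapminder)
library(dplyr)

# Assumption: gmx is the 2007 subset of gapminder (one row per country).
gmx <- gapminder %>%
  filter(year == 2007)

# Count the rows (countries) within each continent; tally() names the count `n`.
ncountries <- gmx %>%
  group_by(continent) %>%
  tally()
```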

  • Arrange ncountries from lowest to highest
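
One possible solution, rebuilding `ncountries` under the same assumptions as before (2007 data, one row per country):

```r
library(gapminder)
library(dplyr)

ncountries <- gapminder %>%
  filter(year == 2007) %>%
  group_by(continent) %>%
  tally()

# arrange() sorts in ascending order by default;
# use arrange(desc(n)) for highest to lowest.
ncountries <- ncountries %>%
  arrange(n)
```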

  • Make a plot of the continents (x-axis) and the number of countries (y-axis)
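
One possible solution. The bar geometry and the axis labels are illustrative choices; `ncountries` is rebuilt here so the sketch runs on its own:

```r
library(gapminder)
library(dplyr)
library(ggplot2)

ncountries <- gapminder %>%
  filter(year == 2007) %>%
  group_by(continent) %>%
  tally()

# geom_col() draws one bar per continent, with height given by n.
continent_plot <- ggplot(ncountries, aes(x = continent, y = n)) +
  geom_col() +
  labs(x = "Continent", y = "Number of countries")
continent_plot
```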

  • Let’s create an object called moz. This will be the gapminder data, but just for Mozambique.
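
One possible solution:

```r
library(gapminder)
library(dplyr)

# Keep only the rows for Mozambique (one row per survey year).
moz <- gapminder %>%
  filter(country == "Mozambique")
```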

  • Plot GDP per capita over time

  • Plot life expectancy over time
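
A sketch covering both time plots (GDP per capita and life expectancy), assuming they refer to the Mozambique subset `moz`; the line geometry is an illustrative choice:

```r
library(gapminder)
library(dplyr)
library(ggplot2)

moz <- gapminder %>%
  filter(country == "Mozambique")

# GDP per capita over time.
gdp_plot <- ggplot(moz, aes(x = year, y = gdpPercap)) +
  geom_line()

# Life expectancy over time: same structure, different y variable.
life_plot <- ggplot(moz, aes(x = year, y = lifeExp)) +
  geom_line()

gdp_plot
life_plot
```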

  • Plot the association between gdpPercap and life expectancy
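
One possible solution, again assuming the Mozambique subset `moz` is meant; a scatterplot is the usual choice for two quantitative variables:

```r
library(gapminder)
library(dplyr)
library(ggplot2)

moz <- gapminder %>%
  filter(country == "Mozambique")

# One point per survey year: GDP per capita on x, life expectancy on y.
assoc_plot <- ggplot(moz, aes(x = gdpPercap, y = lifeExp)) +
  geom_point()
assoc_plot
```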

  • Take the gapminder data, keep only 2007 data, group by continent, and get the maximum and minimum life expectancy for each continent
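
One possible solution (the output column names are illustrative):

```r
library(gapminder)
library(dplyr)

life_range <- gapminder %>%
  filter(year == 2007) %>%
  group_by(continent) %>%
  summarise(max_lifeExp = max(lifeExp),
            min_lifeExp = min(lifeExp))
```

summarise() collapses each group to a single row, so the result has one row per continent.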

  • Take the gapminder data, keep only 2007 data, group by continent, mutate a new variable with the average GDP for that continent, mutate another variable with the difference between each country and its continent’s average GDP
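
One possible solution, assuming "average GDP" means the unweighted mean of `gdpPercap` across the countries of a continent (the new column names are illustrative):

```r
library(gapminder)
library(dplyr)

gdp_diff <- gapminder %>%
  filter(year == 2007) %>%
  group_by(continent) %>%
  # Within a grouped data frame, mean() is computed per continent.
  mutate(continent_avg_gdp = mean(gdpPercap)) %>%
  mutate(diff_from_avg = gdpPercap - continent_avg_gdp) %>%
  ungroup()
```

Unlike summarise(), mutate() keeps one row per country, so each country can be compared with its own continent's average.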

Discussion and activities

  • How else can we visualize the gapminder data?

Day 2

Today’s agenda

Morning

  • Arsenio: Univariate graphs - Categorical
  • Ben: Univariate graphs - Quantitative
  • Joe: Univariate graphs - Exercises

Afternoon

  • Ben: Bivariate graphs - Introduction
  • Arsenio: Bivariate graphs - Categorical vs Categorical
  • Joe: Bivariate graphs - Quantitative vs Quantitative
  • Ben: Bivariate graphs - Categorical vs Quantitative
  • Arsenio: Bivariate graphs - Exercises

Univariate graphs

Categorical (Arsenio)

Quantitative (Ben)

Bivariate graphs

Introduction (Ben)

Categorical vs Categorical (Arsenio)

Quantitative vs Quantitative (Joe)

Categorical vs Quantitative (Ben)

Exercises (Arsenio)

Day 3

Today’s agenda

Morning

  • Ben: Multivariate graphs - Introduction
  • Joe: Multivariate graphs - Preparing the data (dplyr, grouping, etc.).
  • Arsenio: Multivariate graphs - Practical instruction
  • All: Multivariate graphs - Exercises

Afternoon

  • Joe: Mapping - Introduction
  • Arsenio: Mapping - Point maps / dot density maps
  • Ben: Mapping - Choropleth maps
  • All: Mapping - Exercises

Multivariate graphs

Introduction (Ben)

Data preparation (Joe)

Practical instruction (Arsenio)

Exercises (all)

Maps

Introduction (Joe)

Dot density maps (Arsenio)

Choropleth maps (Ben)

Exercises (all)

Day 4

Today’s agenda

Morning

  • Ben: Time-dependent graphs - Introduction
  • Joe: Time-dependent graphs - Time series
  • Arsenio: Time-dependent graphs - Dumbbell, slope, area charts
  • All: Time-dependent graphs - Exercises

Afternoon

  • Arsenio: Statistical models - Introduction
  • Ben: Statistical models - Correlation plots
  • Arsenio: Statistical models - Linear regression
  • Joe: Statistical models - Logistic regression
  • Ben: Statistical models - Mosaic plots
  • Ben: Statistical models - Survival plots
  • All: Statistical models - Exercises

Time-dependent graphs

Introduction (Ben)

Time series (Joe)

Dumbbell charts (Arsenio)

Slope graphs (Arsenio)

Area charts (Arsenio)

Exercises (all)

Statistical models

Introduction (Arsenio)

Correlation plots (Ben)

Linear regression (Arsenio)

Logistic regression (Joe)

Mosaic plots (Ben)

Survival plots (Ben)

Exercises (all)

Day 5

Today’s agenda

Morning

  • Arsenio: Other graphs - Introduction
  • Joe: Interactive graphs - Overview
  • Ben: Customizing graphs - Tips
  • Joe: Saving graphs - Overview
  • Arsenio: Advice

Afternoon

  • Visualization competition
  • Extra / personal project help time

Other graphs

3-D scatterplot (Joe)

Biplots (Arsenio)

Bubble charts (Ben)

Flow diagrams (Joe)

Heatmaps (Joe)

Radar charts (Arsenio)

Scatterplot matrix (Arsenio)

Waterfall charts (Ben)

Word clouds (Joe)

Customizing graphs

Axes (Ben)

Colours (Joe)

Points and lines (Arsenio)

Legends (Ben)

Labels (Joe)

Annotations (Arsenio)

Themes (Ben)

BBC-like (bbplot) (Arsenio)

Saving graphs

Via menus (Arsenio)

Via code (Joe)

File formats

External editing

Interactive graphs

leaflet (Joe)

plotly (Ben)

rbokeh (Arsenio)

rCharts (Joe)

highcharter (Arsenio)

Advice

Labeling (Arsenio)

Signal to noise ratio (Joe)

Color choice (Arsenio)

y-Axis scaling (Ben)

Attribution (Arsenio)

Going further (Joe)

Final note (Arsenio)
